Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Authors

  • R. Boostani Electrical & Computer Department, Shiraz University, Shiraz, Iran.
  • Z. Sedighi Electrical & Computer Department, Shiraz University, Shiraz, Iran.
Abstract:

Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper proposes an efficient approach that elicits prior knowledge, in the form of must-link and cannot-link constraints, from the estimated distribution of the raw data in order to convert a blind clustering problem into a semi-supervised one. To estimate the density distribution of the data, a Weibull Mixture Model (WMM) is used because of its high flexibility. Another contribution of this study is a new hill-and-valley seeking algorithm that finds the constraints for the semi-supervised algorithm. It is assumed that each density peak sits on a cluster center; therefore, the neighboring samples of each center are treated as must-link samples, while near-centroid samples belonging to different clusters are treated as cannot-link ones. The proposed approach is applied to a standard image dataset (designed for clustering evaluation) along with some UCI datasets. The results achieved on both demonstrate the superiority of the proposed method over conventional clustering methods.
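The idea in the abstract, turning density peaks into pairwise constraints, can be illustrated with a minimal sketch. This is not the authors' method: a plain Gaussian kernel estimate stands in for the Weibull Mixture Model, and a simple "no denser neighbor within a radius" rule stands in for the hill-and-valley seeking algorithm, whose details the abstract does not give. The function names and the parameters `bandwidth`, `radius`, and `k` are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def kernel_density(points, bandwidth):
    """Unnormalized Gaussian kernel density at every sample.

    Assumption: this replaces the paper's Weibull Mixture Model.
    """
    dens = []
    for p in points:
        total = sum(
            math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points
        )
        dens.append(total / len(points))
    return dens


def find_peaks(points, density, radius):
    """Indices of samples that no neighbor within `radius` beats in density."""
    peaks = []
    for i, p in enumerate(points):
        beaten = any(
            j != i
            and math.dist(p, q) <= radius
            # Ties are broken by index so a flat plateau yields one peak.
            and (density[j], -j) > (density[i], -i)
            for j, q in enumerate(points)
        )
        if not beaten:
            peaks.append(i)
    return peaks


def derive_constraints(points, peaks, k):
    """Must-link pairs inside each peak's core, cannot-link pairs across cores."""
    # Assign every sample to its nearest peak so the cores cannot overlap.
    members = {c: [] for c in peaks}
    for i, p in enumerate(points):
        nearest = min(peaks, key=lambda c: math.dist(p, points[c]))
        members[nearest].append(i)
    # A core is the (at most) k samples closest to a peak, the peak included.
    cores = [
        sorted(members[c], key=lambda i: math.dist(points[i], points[c]))[:k]
        for c in peaks
    ]
    must = {tuple(sorted(pair)) for core in cores for pair in combinations(core, 2)}
    cannot = {
        (min(a, b), max(a, b))
        for core_a, core_b in combinations(cores, 2)
        for a in core_a
        for b in core_b
    }
    return must, cannot


# Toy data: two well-separated groups of five 2-D samples each.
points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
          (20, 20), (21, 20), (20, 21), (19, 20), (20, 19)]
density = kernel_density(points, bandwidth=2.0)
peaks = find_peaks(points, density, radius=5.0)
must_link, cannot_link = derive_constraints(points, peaks, k=3)
print(peaks, len(must_link), len(cannot_link))
```

The resulting index pairs could then be passed to any constraint-based clustering routine; which semi-supervised algorithm the paper uses is not stated in the abstract.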

Similar resources

From Linguistics to Literature: A Linguistic Approach to the Study of Linguistic Deviations in the Turkish Divan of Shahriar

Chapter I provides an overview of structural linguistics and touches upon the Saussurean dichotomies with the final goal of exploring their relevance to the stylistic studies of literature. To provide evidence for the significance of the study, Chapter II deals with the controversial issue of linguistics and literature, and presents opposing views which, at the same time, have been central to t...

Semi-Supervised Clustering with Limited Background Knowledge

In many machine learning domains, there is a large supply of unlabeled data but limited labeled data, which can be expensive to generate. Consequently, semi-supervised learning, learning from a combination of both labeled and unlabeled data, has become a topic of significant recent interest. Our research focus is on semi-supervised clustering, which uses a small amount of supervised data in the...

Extracting Knowledge from Incomplete Data

Decision-makers are often met with situations where optimal decisions have to be made in the presence of missing information. To facilitate such work we propose application of Armstrong axioms. Key–Words: Decision support systems, uncertainty management, inference axioms

Semi-supervised Pattern Learning for Extracting Relations from Bioscience Texts

A variety of pattern-based methods have been exploited to extract biological relations from the literature. Many of them require significant domain-specific knowledge to build the patterns by hand, or a large amount of labeled data to learn the patterns automatically. In this paper, a semi-supervised model is presented to combine both unlabeled and labeled data for the pattern learning procedure. F...

A Variational Approach to Semi-Supervised Clustering

We present a variational inference scheme for semi-supervised clustering in which data is supplemented with side information in the form of common labels. There is no mutual exclusion of classes assumption and samples are represented as a combinatorial mixture over multiple clusters. The method has other advantages such as the ability to find the most probable number of soft clusters in the dat...

Semi-supervised incremental clustering of categorical data

Abstract. Semi-supervised clustering combines supervised and unsupervised learning to produce better clusterings. In the initial supervised phase of the algorithm, a training sample is produced by random selection. The examples in the training sample are assumed to be labeled by a class attribute. Then, an incremental algorithm developed for ...

Journal title

Volume 6, Issue 2

Pages 287-295

Publication date: 2018-07-01
